By addressing the limit of measurable fluorescent parameters due to instrumentation and spectral overlap, mass cytometry (CyTOF) combines heavy metal spectrometry to allow examination of up to (theoretically) 100 parameters at the single cell level. While spectral overlap is significantly less pronounced in CyTOF than flow cytometry, spillover due to detection sensitivity, isotopic impurities, and oxide formation can impede data interpretability. We designed (Cytometry dATa anALYSis Tools) to provide tools for (pre)processing and analysis of cytometry data, including compensation and in particular, an improved implementation of the single-cell deconvolution algorithm (1).
dbFrame classassignPrelim: Assignment of preliminary IDsestCutoffs: Estimation of separation cutoffsapplyCutoffs: Applying deconvolution parametersoutFCS: Output population-wise FCS filesplotYields: Selecting barcode separation cutoffsplotEvents: Normalized intensitiesplotMahal: All barcode biaxial plotTo demonstrate the debarcoding and compensation work-flow with CATALYST, we provide two sample data sets, sample_data.fcs and ss_beads.fcs. The former follows a 6-choose-3 barcoding scheme where mass channels 102, 104, 105, 106, 108, and 110 were used for labeling such that each of the 20 individual barcodes are positive for exactly 3 out of the 6 barcode channels. Accompanying this, the provided sample_key.csv contains a binary code of length 6 for each sample, e.g. 111000, as its unique identifier. The data in ss_beads.fcs was obtained from 36 single-antibody stained samples. Herefor, beads were stained with antibodies captured by mass channels 139, 141 through 156, and 158 through 176, respectively, and pooled together. Note that, to decrease running time, we sampled 1’000 events of each population at the cost of not necessarily arriving at biologically meaningful results.
CATALYST provides three functions for debarcoding and three visualizations that guide selection of thresholds and give a sense of barcode assignment quality.
In summary, events are assigned to a sample when i) their positive and negative barcode populations are separated by a distance larger than a threshold value and ii) the combination of their positive barcode channels appears in the barcoding scheme. Depending on the supplied scheme, there are two possible ways of arriving at preliminary event assignments:
All data required for debarcoding are held in objects of class dbFrame (see section below), allowing for the following easy-to-use work-flow:
assignPrelim will return a dbFrame containing the input measurement data, barcoding scheme, and preliminary event assignments.applyCutoffs. Optionally, population-specific separation cutoffs may be estimated by running estCutoffs prior to this.plotYields, plotEvents and plotMahal aim to guide selection of devoncolution parameters and to give a sense of the resulting barcode assignment quality.dbFrame with outFCS.dbFrame classData returned by and used throughout debarcoding are stored in a debarcoding frame. There are various parts of the data:
flowFrame specified in assignPrelim to the exprs slot.bc_key slot is a binary matrix with numeric masses as column names and sample names as row names. If supplied with a numeric vector of masses, assignPrelim will internally generate a concurrent representation.bc_ids is a numeric or character vector of the ID assignments that have been made. If a given event’s population separation falls below its separation cutoff, or above the population’s Mahalanobis distance cutoff, it will be give ID 0 for “unassigned”. Assignments can be manipulated with bc_ids<-.deltas slot contains for each event the separations between positive and nergative populations, that is, between the lowest positive and highest negative intesity.normed_bcs are the barcode intensities normalized by population. Here, each event is scaled to the 95% quantile of the population it’s been assigned to. sep_cutoffs are applied to these normalized intensities.sep_cutoffs and mhl_cutoff contain the devoncolution parameters. These can be specified by standard replacement via sep_cutoffs<- and mhl_cutoff<-.counts and yields are matrices of dimension (# samples)x(101). Each row in the counts matrix contains the number of events within a sample for which positive and negative populations are separated by a distance between in [0,0.01), …, [0.99,1], respectively. The percentage of events within a sample that will be obtained after applying a separation cutoff of 0, 0.01, …, 1, respectively, is given in yields.As a brief overview hereof, show(dbFrame) will display
sep_cutoffs are specified)assignPrelim: Assignment of preliminary IDsThe debarcoding process commences by assigning each event a preliminary barcode ID. assignPrelim thereby takes either a binary barcoding scheme or a vector of numeric masses as input, and accordingly assigns each event the appropirate row name or mass as ID. FCS files are read into R with read.FCS of the flowCore package, and are represented as an object of class flowFrame:
fcs_file <- system.file("extdata/sample_data.fcs",
package="CATALYST")
library(flowCore)
sample_ff <- read.FCS(fcs_file)
The debarcoding scheme should be a binary table with sample IDs as row and numeric barcode masses as column names:
csv_file <- system.file("extdata/sample_key.csv",
package="CATALYST")
sample_key <- read.csv(csv_file, row.names=1, check.names=FALSE)
sample_key
## 102 104 105 106 108 110
## A1 1 1 1 0 0 0
## A2 1 1 0 1 0 0
## A3 1 1 0 0 1 0
## A4 1 1 0 0 0 1
## A5 1 0 1 1 0 0
## B1 1 0 1 0 1 0
## B2 1 0 1 0 0 1
## B3 1 0 0 1 1 0
## B4 1 0 0 1 0 1
## B5 1 0 0 0 1 1
## C1 0 1 1 1 0 0
## C2 0 1 1 0 1 0
## C3 0 1 1 0 0 1
## C4 0 1 0 1 1 0
## C5 0 1 0 1 0 1
## D1 0 1 0 0 1 1
## D2 0 0 1 1 1 0
## D3 0 0 1 1 0 1
## D4 0 0 1 0 1 1
## D5 0 0 0 1 1 1
Provided with a flowFrame and a compatible barcoding scheme (barcode masses have to occur in the parameters the measurement data), assignPrelim will return a dbFrame containing exprs passed from the input flowFrame, a numeric or character vector of event assignments in slot bc_ids, separations between barcode populations on the normalized scale in slot deltas, normalized barcode intensities in slot normed_bcs, and the counts and yields matrices. Measurement intensities are normalized by population such that each is scaled to the 95% quantile of asinh transformed measurement intensities of events assigned to the respective barcode population.
library(CATALYST)
##
## Attaching package: 'CATALYST'
## The following object is masked _by_ '.GlobalEnv':
##
## sample_ff
re <- assignPrelim(x=sample_ff, y=sample_key, verbose=FALSE)
re
## dbFrame objectect with
## 100000 events, 10 observables and 20 barcodes:
##
## Current assignments:
## 289 event(s) unassigned
## ID A1 A2 A3 A5 A4 B3 C2 C4 C1 B2 B1 B4 C3
## Count 6087 5930 5753 5598 5440 5172 5089 5062 4982 4960 4955 4926 4888
##
## ID B5 D3 D2 C5 D4 D1 D5
## Count 4744 4739 4702 4653 4378 3834 3819
estCutoffs: Estimation of separation cutoffsAs opposed to a single and global cutoff parameter, estCutoffs will estimate a cutoff value that is specific for each sample to deal with barcode population cell yields that decline in an asynchronous fashion. Thus, the choice of thresholds for the distance between negative and positive barcode populations can be i) automated and ii) independent for each barcode. Nevertheless, reviewing the yield plots (see below), checking and possibly refining separation cutoffs is advisable.
For the estimation of cutoff parameters we concider yields upon debarcoding as a function of the applied cutoffs. Commonly, this function will be characterized by an initial weak decline, where doublets are excluded, and subsequent rapid decline in yields to zero. Inbetween, low numbers of counts with intermediate barcode separation give rise to a plateau. The separation cutoff value should be chosen such that it appropriately balances confidence in barcode assignment and cell yield. We thus fit the yields function, its first and second derivative, and compute the first turning point, marking the on-set of the plateu regime, as an adequte cutoff estimate.
re <- estCutoffs(x=re, verbose=FALSE)
re
## dbFrame objectect with
## 100000 events, 10 observables and 20 barcodes:
##
## Current assignments:
## 289 event(s) unassigned
## ID A1 A2 A3 A5 A4 B3 C2 C4 C1 B2 B1 B4 C3
## Count 6087 5930 5753 5598 5440 5172 5089 5062 4982 4960 4955 4926 4888
##
## ID B5 D3 D2 C5 D4 D1 D5
## Count 4744 4739 4702 4653 4378 3834 3819
##
## Separation cutoffs:
## ID A1 A2 A3 A5 A4 B3 C2 C4 C1 B2 B1 B4 C3
## Cutoff 0.37 0.36 0.31 0.36 0.33 0.33 0.20 0.22 0.38 0.35 0.36 0.35 0.33
##
## ID B5 D3 D2 C5 D4 D1 D5
## Cutoff 0.32 0.20 0.20 0.20 0.16 0.22 0.18
##
## Yields upon debarcoding:
## 71.98% overall yield
## ID A1 A2 A3 A5 A4 B3 C2 C4 C1
## Yield 55.82% 58.95% 65.13% 64.27% 71.12% 70.67% 73.79% 76.39% 61.54%
##
## ID B2 B1 B4 C3 B5 D3 D2 C5 D4
## Yield 73.27% 62.28% 73.02% 73.38% 75.48% 78.46% 81.09% 80.31% 83.28%
##
## ID D1 D5
## Yield 79.24% 82.09%
applyCutoffs: Applying deconvolution parametersOnce preliminary assignements have been made, applyCutoffs will render assignments final by apply the deconvolution parameters: Outliers are filtered by a Mahalanobis distance threshold, which takes into account each population’s covariance, and doublets are removed by exluding events from a population if the separation between their positive and negative signals falls below a separation cutoff. These thresholds are held in the sep_cutoffs and mhl_cutoff slots of the dbFrame. By default, applyCutoffs will try to access the sep_cutoffs in the provided dbFrame, requiring having run estCutoffs prior to this. Alternatively, a numeric vector of cutoff values or a single, global value may be specified. In either case, it is highly recommended to thoroughly review the yields plot (see above), as the choice of separation cutoffs will be determinative for debarcoding quality and cell yield.
# use global separation cutoff
applyCutoffs(x=re, sep_cutoffs=0.2)
# use population-specific cutoffs
re <- applyCutoffs(x=re)
outFCS: Output population-wise FCS filesOnce event assignements have been finalized, a separate FCS file can be written for each population by running outFCS. If option out_nms=NULL (the default), the respective population`s ID in the barcoding scheme will be used as file name. Alternatively, an ordered character vector or a 2 column CSV with sample IDs and the desired file names may be specified as a naming scheme.
outFCS(x=re, out_path=file.path(tempdir()))
plotYields: Selecting barcode separation cutoffsFor each barcode, plotYields will show the distribution of barcode separations and yields upon debarcoding as a function of separation cutoffs. If available, the currently used separation cutoff as well as its resulting yield within the population is indicated in the plot’s main title.
plotYields(x=re, which = "A5")
Option which=0 will render a summary plot of all barcodes. Here, the overall yield achieved by applying the current set of cutoff values will be shown. All yield functions should behave as described above: decline, stagnation, decline. Convergence to 0 yield at low cutoffs is a strong indicator that staining in this channel did not work, and excluding the channel entirely is sensible in this case. It is thus recommended to always view the all-barcodes yield plot to eliminate uninformative populations as a too small population size may cause difficulties, especially when computing spill estimates.
plotYields(x=re, which=0)
plotEvents: Normalized intensitiesNormalized intensities for a barcode can be view with plotEvents. Here, each event corresponds to the intensities plotted on a vertical line at a given point along the x-axis. Option which=0 will display unassigned events, and the number of events shown for a given sample may be varied via n_events. If which="all", the function will render an event plot for all IDs (including 0) with events assigned.
# event plot for unassigned events
plotEvents(x=re, which=0, n_events=1000)
# event plot for sample D4: 001011
plotEvents(x=re, which="D4", n_events=2500)
plotMahal: All barcode biaxial plotFunction plotMahal will plot all inter-barcode interactions for the population specified with argument which. Events are colored by their Mahalanobis distance. NOTE: For more than 7 barcodes (up to 128 samples) the function will render an error, as this visualization is infeasible and hardly informative. Using the default Mahalanobis cutoff value of 30 is recommended in such cases.
# biaxial plot for sample B4: 100101
plotMahal(x=re, which = "B4")
CATALYST performs compensation via a two-step approach comprising:
As in conventional flow cytometry, we can model spillover linearly, with the channel stained for as predictor, and spill-effected channels as response. Thus, the intensity observed in a given channel \(j\) are a linear combination of its real signal and contributions of other channels that spill into it. Let \(s_{ij}\) denote the proportion of channel \(j\) signal that is due to channel \(i\), and \(w_j\) the set of channels that spill into channel \(j\). Then
\[I_{j, observed}\; = I_{j, real} + \sum_{i\in w_j}{s_{ij}}\]
In matrix notation, measurement intensities may be viewed as the convolution of real intensities and a spillover matrix with dimensions number of events times number of measurement parameters:
\[I_{observed}\; = I_{real} \cdot SM\]
Therefore, we can estimate the real signal, \(I_{real}\;\), as:
\[I_{real} = I_{observed}\; \cdot {SM}^{-1} = I_{observed}\; \cdot CM\] where \(\text{SM}^{-1}\) is termed compensation matrix (CM).
Because any signal not in a single stain experiment’s primary channel \(j\) results from channel crosstalk, each spill entry \(s_{ij}\) can be approximated by the slope of a linear regression with channel \(j\) signal as the response, and channel \(i\) signals as the predictors, where \(i\in w_j\). To facilitate robust estimates, we calculate this as the slope of a line through the medians (or trimmed means) of stained and unstained populations, \(m_j^+\) and \(m_i^+\), respectively. The medians (or trimmed means) computed from events that are i) negative in the respective channels; and, ii) not assigned to interacting channels; and, iii) not unassigned, \(m_j^-\) and \(m_i^-\), respectively, are subtracted as to account for background according to:
\[s_{ij} = \frac{m_j^+-m_j^-}{m_i^+-m_i^-}\]
On the basis of their additive nature, spill values are estimated independently for every pair of interacting channels. The current framework exclusively takes into consideration interactions that are sensible from a chemical and physical point of view:
Lastly, the SM’s diagonal entries \(s_{ii}\) are set to 1 so that spill is relative to the total signal measured in a given channel. The list of mass channels that may contain isotopic contaminatons are shown below.
| Metal | Isotope masses |
|---|---|
| La | 138, 139 |
| Pr | 141 |
| Nd | 142, 143, 144, 145, 146, 148, 150 |
| Sm | 144, 147, 148, 149, 150, 152, 154 |
| Eu | 151, 153 |
| Gd | 152, 154, 155, 156, 157, 158, 160 |
| Dy | 156, 158, 160, 161, 162, 163, 164 |
| Er | 162, 164, 166, 167, 168, 170 |
| Tb | 159 |
| Ho | 165 |
| Yb | 168, 170, 171, 172, 173, 174, 176 |
| Tm | 169 |
| Lu | 175, 176 |
computeSpillmat: Estimation of the spillover matrixGiven a flowFrame of single-stained beads (or cells) and a numeric vector specifying the masses stained for, computeSpillmat estimates the spillover matrix as described above. Spill value are effected my the method chosen for their estimation, that is "median" or "mean", and, in the latter case, the specified trim percentage. The process of adjusting these options and reviewing the compensated data may thence be iterative until compensation is satisfiable.
# read in single-stained control samples
fcs_file <- system.file("extdata/ss_beads.fcs",
package="CATALYST")
ss_beads <- read.FCS(fcs_file)
# specify mass channels stained for
bc_ms <- c(139, 141:156, 158:176)
# debarcode
re <- assignPrelim(x = ss_beads, y = bc_ms, verbose = FALSE)
re <- estCutoffs(x = re, verbose = FALSE)
re <- applyCutoffs(x = re)
# compute spillover matrix
spillMat <- computeSpillmat(x=re)
estTrim: Estimation of an optimal trim valueTo optimize results achieven upon compensation, estTrim will estimate the SM for a range of trim values, and evaluate, for each barcode population, the sum over squared medians of each negative channel upon compensation. Along with an optimal trim value, the function will return a figure of population- and channel-wise median counts for each trim value. The returned value is the one that minimizes this sum. Nevertheless, it may be worth chosing a trim value that gives rise to compensated data that is centered around 0 at the cost of a higher sum of squared medians. It is thus recommended to view the diagnostic plot to check the selected value, and potentially choose another. For example, in the figure below, the minimal sum of squares is achieved with a trim value of 0.4 while 0.2 appears to be a better choice as populations are kept from highly positive or negative medians.
# estimate trim value minimizing sum of squared
# population- and channel-wise medians upon compensation
estTrim(x=re, min=0.1, max=0.5, step=0.05)
## [1] 0.4
plotSpillmat: Spillover matrix heat mapplotSpillmat provides a visualization of estimated spill percentages as a heat map. Channels not corresponding to a barcode are annotated in grey, and colours are ramped to the highest spillover value present. Option annotate=TRUE (the default) will display spill values inside each bin, and the total amount of spill caused and received by each channel on the top and to the right, respectively.
plotSpillmat(bc_ms=bc_ms, SM=spillMat)
1. E. R. Zunder et al., Palladium-based mass tag cell barcoding with a doublet-filtering scheme and single-cell deconvolution algorithm. Nat. Protocols. 10, 316–333 (2015–2AD).